Abstract: In this paper we consider the trace regression model where $n$
entries or linear combinations of entries of an unknown $m_1 \times m_2$ matrix
$A_0$ corrupted by noise are observed. We establish for the nuclear-norm
penalized estimator of $A_0$ introduced in [13] a general sharp oracle inequality
with the spectral norm for arbitrary values of $n, m_1, m_2$ under an
incoherence condition on the sampling distribution $\Pi$ of the observed entries.
Then, we apply this method to the matrix completion problem. In this case,
we prove that it satisfies an optimal oracle inequality for the spectral norm,
thus improving upon the only existing result [13] concerning the spectral
norm, which assumes that the sampling distribution is uniform. Note that
our result is valid, in particular, in the high-dimensional setting $m_1 m_2 \gg n$.
Finally we show that the obtained rate is optimal up to logarithmic factors
in a minimax sense.
1. Introduction

Consider $n$ independent observations $(X_i, Y_i)$, $i = 1, \dots, n$, satisfying the trace
regression model:

$$Y_i = \mathrm{tr}(X_i^\top A_0) + \xi_i, \quad i = 1, \dots, n, \tag{1.1}$$

where $X_i$ are random matrices with dimensions $m_1 \times m_2$, $Y_i$ are random variables
in $\mathbb{R}$, $A_0 \in \mathbb{R}^{m_1 \times m_2}$ is an unknown matrix, $\xi_1, \dots, \xi_n$ are i.i.d. zero mean
random variables with $\sigma_\xi^2 = \mathbb{E}\xi_1^2 < \infty$ and $\mathrm{tr}(B)$ denotes the trace of matrix
$B$. We consider the problem of estimation of $A_0$ based on the observations
$(X_i, Y_i)$, $i = 1, \dots, n$.

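For any matrices $A, B \in \mathbb{R}^{m_1 \times m_2}$, we define the scalar products

$$\langle A, B \rangle = \mathrm{tr}(A^\top B),$$

and

$$\langle A, B \rangle_{L_2(\Pi)} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big( \langle A, X_i \rangle \langle B, X_i \rangle \big).$$

Here $\Pi = \frac{1}{n} \sum_{i=1}^{n} \Pi_i$, where $\Pi_i$ denotes the distribution of $X_i$. The
corresponding norm $\|A\|_{L_2(\Pi)}$ is given by

$$\|A\|_{L_2(\Pi)}^2 = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big( \langle A, X_i \rangle^2 \big).$$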

Example 1: Matrix completion. Let the design matrices $X_i$ be i.i.d. with
distribution $\Pi$ on the set

$$\mathcal{X} = \{ e_j(m_1) e_k^\top(m_2),\ 1 \le j \le m_1,\ 1 \le k \le m_2 \}, \tag{1.2}$$

where $e_k(m)$ are the canonical basis vectors in $\mathbb{R}^m$. The set $\mathcal{X}$ forms an
orthonormal basis in the space of $m_1 \times m_2$ matrices that will be called the matrix
completion basis. Let also $n < m_1 m_2$. Then the problem of estimation of $A_0$
coincides with the problem of matrix completion with random sampling
distribution $\Pi$. Existing results typically assume that $\Pi$ is the uniform distribution
on $\mathcal{X}$. See, for instance, [9, 19] for the non-noisy case ($\xi_i = 0$, $i = 1, \dots, n$) and
[13] for the noisy case and the references cited therein. In several applications,
like the Netflix problem, the distribution $\Pi$ is not necessarily uniform on $\mathcal{X}$.
We will show that optimal estimation of $A_0$ is possible in this context under a
weaker set of conditions as compared to those used in [7, 9, 19]. One can also
consider other matrix measurement models. For instance, [10] considers
sampling without replacement in the set $\mathcal{X}$ defined in (1.2) and [11] investigates
several orthonormal families in the context of quantum tomography.
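
As an illustration of Example 1, the following is a minimal simulation sketch of the matrix completion model with a possibly non-uniform sampling distribution. The function name `sample_trace_regression`, the array `pi` of sampling probabilities, the Gaussian noise and all numerical values are assumptions made for this sketch only; they are not taken from the paper.

```python
import numpy as np

def sample_trace_regression(A0, pi, n, sigma, rng):
    """Draw n observations from model (1.1) with the matrix completion basis.

    pi is an m1 x m2 array of sampling probabilities (a hypothetical
    stand-in for the distribution Pi); it need not be uniform.
    """
    m1, m2 = A0.shape
    # Sample entry positions (j, k) with probability pi[j, k].
    flat = rng.choice(m1 * m2, size=n, p=pi.ravel())
    rows, cols = np.unravel_index(flat, (m1, m2))
    # Illustrative noise choice: i.i.d. centered Gaussian with std sigma.
    noise = sigma * rng.standard_normal(n)
    # For X = e_j e_k^T, tr(X^T A0) is the (j, k) entry of A0.
    Y = A0[rows, cols] + noise
    return rows, cols, Y

rng = np.random.default_rng(0)
m1, m2, r = 30, 20, 2
A0 = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))  # rank-2 target
w = rng.uniform(0.5, 1.5, size=(m1, m2))
pi = w / w.sum()  # non-uniform sampling probabilities
rows, cols, Y = sample_trace_regression(A0, pi, n=200, sigma=0.1, rng=rng)
```

Each observation is stored as a position `(rows[i], cols[i])` rather than as a full matrix $X_i$, which is enough because $X_i$ has a single nonzero entry.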

Example 2. Column masks. Let the design matrices $X_i$ be independent
matrices, which have only one nonzero column. The trace regression model can
then be reformulated as a longitudinal regression model, with different
distributions of $X_i$ corresponding to different tasks; see [1, 15, 21] for more details and
the references cited therein.



Example 3. "Complete" subgaussian design. Let the design matrices
$X_i$ be i.i.d. replications of a random matrix $X$ such that $\langle A, X \rangle$ is a subgaussian
random variable for any $A \in \mathbb{R}^{m_1 \times m_2}$. This approach originates from compressed
sensing, where typically the entries of $X$ are either i.i.d. standard Gaussian
or Rademacher random variables. The problem of exact reconstruction of $A_0$
under such a design in the non-noisy setting was studied in [5, 16, 20], whereas
estimation of $A_0$ in the presence of noise is analyzed in [5, 16, 21], among which
[5, 11, 21] treat the high-dimensional case $m_1 m_2 > n$.

We consider the following procedure introduced recently in [13]:

$$\hat A^\lambda = \operatorname*{argmin}_{A \in \mathbb{R}^{m_1 \times m_2}} \left\{ \|A\|_{L_2(\Pi)}^2 - \left\langle \frac{2}{n} \sum_{i=1}^{n} Y_i X_i, A \right\rangle + \lambda \|A\|_1 \right\}, \tag{1.3}$$

where $\lambda > 0$ is the regularization parameter and $\|A\|_1$ is the nuclear norm of $A$.

In the matrix completion problem, the sampling scheme is typically assumed
to be uniform on $\mathcal{X}$ and this assumption is crucial to establish exact
recovery in the noiseless case or to derive the optimal rates of estimation with
the Frobenius norm in the setting $n < m_1 m_2$; see for instance [6, 9, 13] and
the references cited therein. However, in several applications such as the Netflix
problem, the practitioner does not choose the sampling scheme and the observed
entries of $A_0$ are not guaranteed to follow the uniform distribution. Therefore,
the existing exact recovery or estimation results do not cover this situation.

In this paper, we concentrate mainly on the matrix completion problem. Our
contributions are the following. First, we establish for the estimator (1.3) the
following result. If $A_0$ is low rank, $\Pi$ satisfies an incoherence condition and
some additional mild conditions are satisfied, then we have for any
$t > 0$ with probability at least $1 - e^{-t}$



where $C > 0$ is a numerical constant, $a$ is a bound on the absolute values of the
entries of $A_0$ and $\|\cdot\|_\infty$ is the spectral norm. Second, we show that the above
rate is optimal (in the minimax sense) up to logarithmic factors on a particular
class of low rank matrices.

Note that the existing estimation results usually concern the Frobenius norm
[5, 10, 11, 18]. The only existing estimation result for the spectral norm is due
to [13], which assumes that the entries are sampled uniformly at random. In this
case, the estimator (1.3) can be computed directly by soft-thresholding of the
singular values in the SVD of $\mathbf{X} = \frac{m_1 m_2}{n} \sum_{i=1}^{n} Y_i X_i$ (see Equation (3.2) in [13]).

Exploiting this explicit simple form, [13] established (1.4) for the procedure
(1.3). This approach does not generalize to other sampling distributions $\Pi$ since
(1.3) does not admit an explicit form in general. In this paper, we propose
an alternative approach to derive for the estimator (1.3) the oracle inequality
(1.4) when the sampling distribution $\Pi$ satisfies an incoherence condition, which
covers in particular the case of uniform sampling and also holds in more
general situations.
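
The explicit form available under uniform sampling can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the uniform distribution on the matrix completion basis, so that $\|A\|_{L_2(\Pi)}^2 = \|A\|_2^2/(m_1 m_2)$, and the objective considered is $\|A\|_2^2/(m_1 m_2) - \langle \frac{2}{n}\sum_i Y_i X_i, A\rangle + \lambda\|A\|_1$. Completing the square in this objective gives soft-thresholding of the singular values of $\frac{m_1 m_2}{n}\sum_i Y_i X_i$ at level $\lambda m_1 m_2 / 2$. The function name and argument layout are hypothetical.

```python
import numpy as np

def soft_threshold_estimator(rows, cols, Y, shape, lam):
    """Minimizer of ||A||_F^2/(m1*m2) - <(2/n) sum_i Y_i X_i, A> + lam*||A||_nuc
    when X_i = e_{rows[i]} e_{cols[i]}^T (uniform-sampling case only)."""
    m1, m2 = shape
    n = len(Y)
    X_bar = np.zeros(shape)
    # Accumulate sum_i Y_i X_i; np.add.at handles repeated positions correctly.
    np.add.at(X_bar, (rows, cols), Y)
    X_bar *= m1 * m2 / n
    U, s, Vt = np.linalg.svd(X_bar, full_matrices=False)
    # Soft-threshold the singular values at lam * m1 * m2 / 2.
    s_thr = np.maximum(s - lam * m1 * m2 / 2.0, 0.0)
    return (U * s_thr) @ Vt
```

For a non-uniform sampling distribution the quadratic term is a weighted Frobenius norm, the square can no longer be completed in this way, and an iterative convex solver would be needed instead.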

Note finally that the results of this paper are obtained for general settings
of $n, m_1, m_2$. In particular they are valid in the high-dimensional setting, which
corresponds to $m_1 m_2 \gg n$, with low rank matrices $A_0$.

In Section 2, we recall some tools and definitions and establish a preliminary
result. In Section 3, we establish a general oracle inequality for the spectral norm. 
In Section 4, we apply the general result of the previous section to the matrix 
completion problem and establish the optimality (up to logarithmic factors) of 
(1.3). Finally, Section 5 contains additional material and proofs. 
2. Tools and preliminary result

We recall first some basic facts about matrices. Let $A \in \mathbb{R}^{m_1 \times m_2}$ be a rectangular
matrix, and let $r = \mathrm{rank}(A) \le \min(m_1, m_2)$ denote its rank. The singular value
decomposition (SVD) of $A$ admits the form

$$A = \sum_{j=1}^{r} \sigma_j(A)\, u_j^{(A)} \otimes v_j^{(A)}$$

with orthonormal vectors $u_1^{(A)}, \dots, u_r^{(A)} \in \mathbb{R}^{m_1}$, orthonormal vectors $v_1^{(A)}, \dots, v_r^{(A)} \in \mathbb{R}^{m_2}$
and real numbers $\sigma_1(A) \ge \dots \ge \sigma_r(A) > 0$ (the singular values of $A$). The
pair of linear vector spaces $(S_1(A), S_2(A))$, where $S_1(A)$ is the linear span of
$\{u_1^{(A)}, \dots, u_r^{(A)}\}$ and $S_2(A)$ is the linear span of $\{v_1^{(A)}, \dots, v_r^{(A)}\}$, will be called
the support of $A$. We will denote by $S_j(A)^\perp$ the orthogonal complements of
$S_j(A)$, $j = 1, 2$, and by $P_S$ the orthogonal projector onto the linear vector
subspace $S$ of $\mathbb{R}^{m_j}$, $j = 1, 2$. For any $A \in \mathbb{R}^{m_1 \times m_2}$ with support $(S_1, S_2)$, we define

$$\mathcal{P}_A(B) := B - P_{S_1^\perp} B P_{S_2^\perp}, \qquad \mathcal{P}_A^\perp(B) := P_{S_1^\perp} B P_{S_2^\perp}, \qquad B \in \mathbb{R}^{m_1 \times m_2}.$$
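The Schatten-$p$ (quasi-)norm $\|A\|_p$ of matrix $A$ is defined by

$$\|A\|_p = \left( \sum_{j=1}^{\min(m_1, m_2)} \sigma_j(A)^p \right)^{1/p} \ \text{for } 0 < p < \infty, \quad \text{and} \quad \|A\|_\infty = \sigma_1(A).$$

Recall the well-known trace duality property:

$$|\mathrm{tr}(A^\top B)| \le \|A\|_1 \|B\|_\infty, \quad \forall A, B \in \mathbb{R}^{m_1 \times m_2}.$$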
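
The Schatten norms and the trace duality inequality can be checked numerically. The sketch below is only an illustration on random matrices; the helper name `schatten_norm` and the test matrices are assumptions of this example.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten-p norm of A: the l_p norm of its singular values.

    p = 1 is the nuclear norm, p = 2 the Frobenius norm and
    p = np.inf the spectral norm.
    """
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float((s ** p).sum() ** (1.0 / p))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

# Trace duality: |tr(A^T B)| <= ||A||_1 * ||B||_inf.
lhs = abs(np.trace(A.T @ B))
rhs = schatten_norm(A, 1) * schatten_norm(B, np.inf)
```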

We will also use the fact that the subdifferential of the convex function $A \mapsto \|A\|_1$
is the following set of matrices:

$$\partial \|A\|_1 = \left\{ \sum_{j=1}^{r} u_j^{(A)} \otimes v_j^{(A)} + P_{S_1(A)^\perp} W P_{S_2(A)^\perp} : \|W\|_\infty \le 1 \right\} \tag{2.1}$$

(cf. [24]).

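We will need the following quantities introduced in [12]:

$$\kappa_r = \kappa_r(\Pi) := \inf\left\{ \|B\|_{L_2(\Pi)} : B \in \mathbb{R}^{m_1 \times m_2},\ \|B\|_2 = 1,\ \mathrm{rank}(B) \le r \right\}$$

and

$$\kappa_r' = \kappa_r'(\Pi) := \sup\left\{ \|B\|_{L_2(\Pi)} : B \in \mathbb{R}^{m_1 \times m_2},\ \|B\|_2 = 1,\ \mathrm{rank}(B) \le r \right\}.$$

These quantities $\kappa_r(\Pi)$ and $\kappa_r'(\Pi)$ measure the "distortion" on the set of low
rank matrices between the geometries induced respectively by the $L_2(\Pi)$ and
Frobenius norms.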
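We introduce the following measure of coherence

$$\rho = \rho(\Pi) := \sup\left\{ \frac{|\langle A, B \rangle_{L_2(\Pi)}|}{\|A\|_1 \|B\|_1} : A, B \in \mathbb{R}^{m_1 \times m_2} \setminus \{0\},\ \langle A, B \rangle = 0 \right\}. \tag{2.2}$$

We can now state our incoherence condition.
Assumption 1. Let $c_0 > 0$, $\alpha > 1$ and $r \ge 1$. We have

$$\rho \le \frac{\kappa_1^2}{(1 + 2c_0)\alpha r}.$$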

The quantity $\rho$ is the natural extension to the matrix case of the incoherence
measure introduced for the sparse vector case in [8] and further studied in [2, 3,
14] and the references cited therein. Concerning the matrix completion problem,
[5, 6, 9, 13] study the case of uniform at random sampling. Assumption 1 is then
trivially satisfied with $\rho = 0$, since we have $\langle A, B \rangle_{L_2(\Pi)} = \frac{1}{m_1 m_2} \langle A, B \rangle$ for any
$A, B \in \mathbb{R}^{m_1 \times m_2}$. Note also that [5, 6, 9] need in addition the following condition
in order to recover $A_0$ in the noiseless case



$$\max_{U \in \mathcal{X}} |P_{S_1, S_2}(U)|_2^2 \le \frac{2\nu r}{m_1 \wedge m_2}, \qquad \ldots \le \frac{2\nu r}{(m_1 \wedge m_2)^2},$$



for some $\nu > 0$, where $S_j = S_j(A_0)$, $j = 1, 2$, and $|\cdot|_2$, $|\cdot|_\infty$ denote respectively
the $l_2$ and $l_\infty$ vector norms. Although called "incoherence condition" in [9],
this condition is entirely different from Assumption 1 and we do not need it to
establish our estimation result.
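
For the matrix completion basis, the $L_2(\Pi)$ scalar product reduces to a weighted entrywise sum, which makes the claim "$\rho = 0$ under uniform sampling" easy to check numerically. The sketch below is an illustration under that assumption; the helper `inner_l2_pi`, the probability tables and the pair $(A, B)$ are hypothetical choices, and the computed ratio is only a lower bound on $\rho$ for the skewed distribution, not its value.

```python
import numpy as np

def inner_l2_pi(A, B, pi):
    """<A, B>_{L2(Pi)} for i.i.d. sampling on the matrix completion basis.

    With X = e_j e_k^T drawn with probability pi[j, k], we have
    E(<A, X><B, X>) = sum_{j,k} pi[j, k] * A[j, k] * B[j, k].
    """
    return float(np.sum(pi * A * B))

# A pair of matrices that is orthogonal for the Frobenius scalar product.
A = np.eye(2)
B = np.diag([1.0, -1.0])

pi_uniform = np.full((2, 2), 0.25)
pi_skewed = np.array([[0.4, 0.2], [0.1, 0.3]])

# Uniform sampling: <A, B>_{L2(Pi)} = <A, B> / (m1 * m2) = 0.
uniform_value = inner_l2_pi(A, B, pi_uniform)

# Non-uniform sampling: the same pair is no longer orthogonal in L2(Pi),
# so this ratio is a lower bound on the coherence rho(Pi).
ratio = abs(inner_l2_pi(A, B, pi_skewed)) / (
    np.linalg.norm(A, 'nuc') * np.linalg.norm(B, 'nuc')
)
```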

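In [13], the authors establish an oracle inequality for the $L_2(\Pi)$ norm under a
condition akin to the restricted eigenvalue condition in sparse vector estimation:
$\mu_{c_0}(A_0) < \infty$ for some $c_0 > 0$, where

$$\mu_{c_0}(A_0) := \inf\left\{ \mu > 0 : \|\mathcal{P}_{A_0}(B)\|_2 \le \mu \|B\|_{L_2(\Pi)},\ \forall B \in \mathcal{C}_{A_0, c_0} \right\},$$

and $\mathcal{C}_{A_0, c_0}$ is the following cone of matrices

$$\mathcal{C}_{A_0, c_0} := \left\{ B \in \mathbb{R}^{m_1 \times m_2} : \|\mathcal{P}_{A_0}^\perp(B)\|_1 \le c_0 \|\mathcal{P}_{A_0}(B)\|_1 \right\}.$$

Note that $\mu_{c_0}(A_0)$ is a nondecreasing function of $c_0$. We establish in Proposition
1 below that Assumption 1 implies $\mu_{c_0}(A_0) \le \frac{1}{\kappa_1} \sqrt{\frac{\alpha}{\alpha - 1}}$ if $\mathrm{rank}(A_0) \le r$.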

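Proposition 1. Let Assumption 1 be satisfied for some $c_0 > 0$, $\alpha > 1$ and
$r \ge 1$. Assume furthermore that $\kappa_1 = \kappa_1(\Pi) > 0$. Then, for any $A \in \mathbb{R}^{m_1 \times m_2}$
with $\mathrm{rank}(A) \le r$, we have

$$\mu_{c_0}(A) \le \frac{1}{\kappa_1} \sqrt{\frac{\alpha}{\alpha - 1}} < \infty.$$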
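Proof. For any $B \in \mathcal{C}_{A, c_0}$, we have

$$\begin{aligned}
\|\mathcal{P}_A(B) + \mathcal{P}_A^\perp(B)\|_{L_2(\Pi)}^2 &= \|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2 + \|\mathcal{P}_A^\perp(B)\|_{L_2(\Pi)}^2 + 2\langle \mathcal{P}_A(B), \mathcal{P}_A^\perp(B) \rangle_{L_2(\Pi)} \\
&\ge \|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2 + 2\langle \mathcal{P}_A(B), \mathcal{P}_A^\perp(B) \rangle_{L_2(\Pi)} \\
&\ge \|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2 - 2\rho c_0 \|\mathcal{P}_A(B)\|_1^2 \\
&\ge \|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2 - 2\rho c_0 r \|\mathcal{P}_A(B)\|_2^2.
\end{aligned} \tag{2.3}$$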

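Next, we treat $\|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2$. For the sake of brevity, we set $r = \mathrm{rank}(\mathcal{P}_A(B))$
and, for any $1 \le j \le r$, $\sigma_j = \sigma_j(\mathcal{P}_A(B))$, $u_j = u_j^{(\mathcal{P}_A(B))}$ and $v_j = v_j^{(\mathcal{P}_A(B))}$.
Recall that the SVD of $\mathcal{P}_A(B)$ is

$$\mathcal{P}_A(B) = \sum_{j=1}^{r} \sigma_j u_j \otimes v_j.$$

For any $B \in \mathbb{R}^{m_1 \times m_2}$, we have

$$\begin{aligned}
\|\mathcal{P}_A(B)\|_{L_2(\Pi)}^2 &= \sum_{j=1}^{r} \sigma_j^2 \|u_j \otimes v_j\|_{L_2(\Pi)}^2 + \sum_{j \ne k} \sigma_j \sigma_k \langle u_j \otimes v_j, u_k \otimes v_k \rangle_{L_2(\Pi)} \\
&\ge \kappa_1^2 \sum_{j=1}^{r} \sigma_j^2 - \rho \sum_{j \ne k} \sigma_j \sigma_k \\
&\ge (\kappa_1^2 - \rho r) \sum_{j=1}^{r} \sigma_j^2 = (\kappa_1^2 - \rho r) \|\mathcal{P}_A(B)\|_2^2.
\end{aligned} \tag{2.4}$$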

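Combining (2.3) and (2.4) with Assumption 1 yields

$$\|B\|_{L_2(\Pi)}^2 \ge \left( \kappa_1^2 - \rho(1 + 2c_0) r \right) \|\mathcal{P}_A(B)\|_2^2 \ge \frac{\alpha - 1}{\alpha} \kappa_1^2 \|\mathcal{P}_A(B)\|_2^2.$$

Thus, we get the result. □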



3. General oracle inequalities for the spectral norm 

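Define the random matrices

$$M_1 = \frac{1}{n} \sum_{i=1}^{n} \xi_i X_i, \qquad M_2 = \frac{1}{n} \sum_{i=1}^{n} \Big( \langle A_0, X_i \rangle X_i - \mathbb{E}\big( \langle A_0, X_i \rangle X_i \big) \Big). \tag{3.1}$$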
We can now state the main result, which holds for general settings including
in particular the three examples presented in the introduction.

Theorem 1. Let Assumption 1 be satisfied with $c_0 = 5$ and $\mathrm{rank}(A_0) \le r$.
Then, the estimator (1.3) satisfies on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$

$$\|\hat A^\lambda - A_0\|_\infty \le \left( \frac{5}{6} + \frac{6\sqrt{2}}{11(\alpha - 1)} \right) \frac{\lambda}{\kappa_1^2}. \tag{3.2}$$

In [13], the authors obtained an oracle inequality for the Frobenius norm 
with an upper bound proportional to rank(Ao)A/Kj (with our notations), which 
trivially implies a suboptimal bound for the spectral norm since $\|\cdot\|_\infty \le \|\cdot\|_2$.
Under Assumption 1, we obtain a bound (3.2) that does not depend on $\mathrm{rank}(A_0)$.
We will see in Section 4 that this oracle inequality gives the optimal rate for the 
spectral norm in the matrix completion problem. 

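Proof. Note first that a necessary condition of extremum in the minimization
problem (1.3) implies that there exists $\hat V \in \partial \|\hat A^\lambda\|_1$ such that, for all $A \in \mathbb{R}^{m_1 \times m_2}$,

$$2\langle \hat A^\lambda, \hat A^\lambda - A \rangle_{L_2(\Pi)} - \left\langle \frac{2}{n} \sum_{i=1}^{n} Y_i X_i, \hat A^\lambda - A \right\rangle + \lambda \langle \hat V, \hat A^\lambda - A \rangle = 0.$$

Set $\Delta = \hat A^\lambda - A_0$. It follows from the previous display that, for any $U \in \mathbb{R}^{m_1 \times m_2}$
with $\|U\|_1 = 1$,

$$|\langle \Delta, U \rangle_{L_2(\Pi)}| \le \|M_1 + M_2\|_\infty + \frac{\lambda}{2}.$$

Thus we get, on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$, for any $U \in \mathbb{R}^{m_1 \times m_2}$ with $\|U\|_1 = 1$,

$$|\langle \Delta, U \rangle_{L_2(\Pi)}| \le \frac{5}{6} \lambda. \tag{3.3}$$

Next, recall that the SVD of $\Delta = \hat A^\lambda - A_0$ is

$$\Delta = \sum_{j=1}^{\hat r} \sigma_j(\Delta)\, u_j^{(\Delta)} \otimes v_j^{(\Delta)}, \qquad \hat r = \mathrm{rank}(\Delta).$$

Take $U = u_1^{(\Delta)} \otimes v_1^{(\Delta)}$. Then, we have

$$\langle \Delta, U \rangle_{L_2(\Pi)} = \langle P_1(\Delta), U \rangle_{L_2(\Pi)} + \langle P_1^\perp(\Delta), U \rangle_{L_2(\Pi)},$$

where $P_1$ and $P_1^\perp$ denote the orthogonal projections onto $\mathcal{M}_1 = \mathrm{l.s.}\big( u_1^{(\Delta)} \otimes v_1^{(\Delta)} \big)$
and $\mathcal{M}_1^\perp$ respectively. Combining the previous display with Equation (3.3) and
Assumption 1 gives

$$|\langle P_1(\Delta), U \rangle_{L_2(\Pi)}| \le \frac{5}{6}\lambda + \rho \|U\|_1 \|P_1^\perp(\Delta)\|_1 \le \frac{5}{6}\lambda + \rho \|P_1^\perp(\Delta)\|_1 \le \frac{5}{6}\lambda + \rho \|\Delta\|_1.$$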



imsart-generlc ver. 2009/08/13 file: SNMC.tex date: October 26, 2011 



K. Lounici/ Optimal spectral norm rates for noisy low-rank matrix completion 8 

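Lemma 1 yields on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$ that

$$\|\mathcal{P}_{A_0}^\perp(\Delta)\|_1 \le 5 \|\mathcal{P}_{A_0}(\Delta)\|_1,$$

which implies that $\Delta = \hat A^\lambda - A_0 \in \mathcal{C}_{A_0, 5}$. Combining the last two displays, we
get on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$

$$\begin{aligned}
|\langle P_1(\Delta), U \rangle_{L_2(\Pi)}| &\le \frac{5}{6}\lambda + 6\sqrt{2\,\mathrm{rank}(A_0)}\, \rho\, \|\mathcal{P}_{A_0}(\Delta)\|_2 \\
&\le \frac{5}{6}\lambda + 6\sqrt{2\,\mathrm{rank}(A_0)}\, \rho\, \mu_5(A_0) \|\Delta\|_{L_2(\Pi)}.
\end{aligned}$$

Theorem 2 in [13] with $A = A_0$ gives on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$

$$\|\Delta\|_{L_2(\Pi)} \le \lambda \mu_5(A_0) \sqrt{\mathrm{rank}(A_0)}.$$

Combining the last two displays, we get on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$

$$\begin{aligned}
|\langle P_1(\Delta), U \rangle_{L_2(\Pi)}| &\le \frac{5}{6}\lambda + 6\sqrt{2}\, \mathrm{rank}(A_0)\, \rho\, \mu_5^2(A_0)\, \lambda \\
&\le \left( \frac{5}{6} + \frac{6\sqrt{2}}{11(\alpha - 1)} \right) \lambda,
\end{aligned}$$

where we have used Assumption 1 and Proposition 1 in the second line.
Next, note that

$$\langle P_1(\Delta), U \rangle_{L_2(\Pi)} = \sigma_1(\Delta) \|U\|_{L_2(\Pi)}^2 \ge \sigma_1(\Delta) \kappa_1^2 \|U\|_2^2 = \sigma_1(\Delta) \kappa_1^2.$$

Finally, combining the last two displays, we get the result. □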

4. Matrix completion upper bounds with the spectral norm 

In this section, we apply the general results of the previous section to the matrix 
completion problem with i.i.d. sub-exponential noise variables. 

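Assumption 2. There exist constants $\sigma, \bar c > 0$, $\beta \ge 1$ and $\tilde c$ such that

$$\max_{i = 1, \dots, n} \mathbb{E} \exp\left( \frac{|\xi_i|^\beta}{\sigma^\beta} \right) < \tilde c, \qquad \mathbb{E}\xi_i^2 \ge \bar c\, \sigma^2, \quad \forall\, 1 \le i \le n. \tag{4.1}$$

We need the following additional condition on $\kappa_1$ and $\kappa_1'$.
Assumption 3. There exist constants $0 < c_1 \le c_1' < \infty$ such that

$$\sqrt{\frac{c_1}{m_1 m_2}} \le \kappa_1 \le \kappa_1' \le \sqrt{\frac{c_1'}{m_1 m_2}}. \tag{4.2}$$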
This assumption imposes that the probability to observe any entry is not too
small or too large. It guarantees that any low-rank matrix can be estimated
with optimal spectral norm rate (up to logarithmic factors). Indeed, when
Assumption 3 is satisfied, we can establish that the stochastic errors $\|M_1\|_\infty$ and
$\|M_2\|_\infty$ are small enough with probability close to 1.

Set $m = m_1 + m_2$ and $M = m_1 \vee m_2$. Denote the entries of $A_0$ by $a_0(i, j)$, $1 \le
i \le m_1$, $1 \le j \le m_2$. We can now state our main results concerning matrix
completion.

Theorem 2. Let $X_i$ be i.i.d. with distribution $\Pi$ on $\mathcal{X}$ defined in (1.2). Let
Assumption 1 be satisfied with $c_0 = 5$ and $\mathrm{rank}(A_0) \le r$. Let Assumptions 2
and 3 hold with, in addition, $2c_1' < M c_1$. Assume that $\max_{i,j} |a_0(i, j)| \le a$ for some
constant $a$. For $t > 0$, consider the regularization parameter $\lambda$ satisfying

$$\lambda \ge C(\sigma \vee a) \max\left\{ \sqrt{\frac{t + \log(m)}{(m_1 \wedge m_2) n}},\ \frac{(t + \log(m)) \log^{1/\beta}(m_1 \wedge m_2)}{n} \right\}, \tag{4.3}$$

where $C > 0$ is a large enough constant that can depend only on $\alpha, \beta, \tilde c, \bar c, c_1, c_1'$.
Then, the estimator (1.3) satisfies, with probability at least $1 - e^{-t}$,

$$\|\hat A^\lambda - A_0\|_\infty \le C'(\sigma \vee a)\, m_1 m_2 \max\left\{ \sqrt{\frac{t + \log(m)}{(m_1 \wedge m_2) n}},\ \frac{(t + \log(m)) \log^{1/\beta}(m_1 \wedge m_2)}{n} \right\}, \tag{4.4}$$

where $C' > 0$ can depend only on $\alpha, \beta, \tilde c, \bar c, c_1, c_1'$.



Note that the technical condition $2c_1' < M c_1$ is mild when $M > 2$ is large.
Note also that when the noise variables are bounded, this technical
condition is no longer needed since we can apply Proposition 2 instead of Proposition
3 in Section 5 to control $\|M_1\|_\infty$.

Proof. This proof consists in applying Theorem 1 with a sufficiently large $\lambda$ such
that the condition $\lambda \ge 3\|M_1 + M_2\|_\infty$ holds with probability close to 1. To this
end, we need to control the stochastic errors $\|M_1\|_\infty$ and $\|M_2\|_\infty$; see Lemmas
2 and 3 in Section 5 below. Next a simple union bound argument gives for any
$\lambda$ satisfying (4.3) that (4.4) holds with probability at least $1 - 3e^{-t}$, which can
then be rewritten as $1 - e^{-t}$ with a proper adjustment of the constants. □

Note that the natural choice of $t$ is of the order $\log(m)$. In addition, if $n >
M \log^{1 + 2/\beta}(m)$, then we choose $\lambda$ of the form

$$\lambda = C(\sigma \vee a) \sqrt{\frac{\log(m)}{(m_1 \wedge m_2) n}}, \tag{4.5}$$

where $C > 0$ is a large enough constant that can depend only on $\alpha, \beta, \tilde c, \bar c, c_1, c_1'$.
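
The choice of the regularization parameter can be sketched numerically. The functions below evaluate the right-hand sides as read from (4.3) and (4.5), namely $C(\sigma \vee a)\max\{\sqrt{(t + \log m)/((m_1 \wedge m_2)n)},\ (t + \log m)\log^{1/\beta}(m_1 \wedge m_2)/n\}$ and $C(\sigma \vee a)\sqrt{\log(m)/((m_1 \wedge m_2)n)}$. The paper does not give a value for the constant $C$, so the default `C=1.0` and the function names are placeholders assumed for illustration.

```python
import math

def lambda_threshold(m1, m2, n, sigma, a, t, beta, C=1.0):
    """Right-hand side of (4.3); C is an unspecified constant (placeholder)."""
    m = m1 + m2
    k = min(m1, m2)
    term_sqrt = math.sqrt((t + math.log(m)) / (k * n))
    term_lin = (t + math.log(m)) * math.log(k) ** (1.0 / beta) / n
    return C * max(sigma, a) * max(term_sqrt, term_lin)

def lambda_simple(m1, m2, n, sigma, a, C=1.0):
    """Form (4.5), intended for the regime n > M * log^{1 + 2/beta}(m)."""
    m = m1 + m2
    return C * max(sigma, a) * math.sqrt(math.log(m) / (min(m1, m2) * n))

# With t = log(m) and n large, the square-root term of (4.3) dominates,
# and it differs from (4.5) only by the numerical factor sqrt(2).
m1, m2, n = 100, 100, 10**6
lam_43 = lambda_threshold(m1, m2, n, sigma=1.0, a=2.0, t=math.log(m1 + m2), beta=1.0)
lam_45 = lambda_simple(m1, m2, n, sigma=1.0, a=2.0)
```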
We immediately obtain the following corollary of Theorem 2 
Corollary 1. Let the assumptions of Theorem 2 be satisfied with $\lambda$ as in (4.5)
and a large enough constant $C > 0$ that can depend only on $\alpha, \beta, \tilde c, \bar c, c_1, c_1'$, and let
$n > (m_1 \vee m_2) \log^{1 + 2/\beta}(m)$.

Then, the estimator (1.3) satisfies, with probability at least $1 - 1/m$,

$$\|\hat A^\lambda - A_0\|_\infty \le C'(\sigma \vee a)\, m_1 m_2 \sqrt{\frac{\log(m)}{(m_1 \wedge m_2) n}}, \tag{4.6}$$

where $C' > 0$ can depend only on $\alpha, \beta, \tilde c, \bar c, c_1, c_1'$.

We now prove that the above result is optimal up to logarithmic factors by
establishing a minimax lower bound. We will denote by $\inf_{\hat A}$ the infimum over
all estimators $\hat A$ with values in $\mathbb{R}^{m_1 \times m_2}$. For any integer $r \le \min(m_1, m_2)$ and
any $a > 0$ we consider the class of matrices

$$\mathcal{A}(r, a) = \left\{ A_0 \in \mathbb{R}^{m_1 \times m_2} : \mathrm{rank}(A_0) \le r,\ \max_{i,j} |a_0(i, j)| \le a \right\}.$$

For any $A \in \mathbb{R}^{m_1 \times m_2}$, let $\mathbb{P}_A$ denote the probability distribution of the
observations $(X_1, Y_1, \dots, X_n, Y_n)$ with $\mathbb{E}(Y_i | X_i) = \langle A, X_i \rangle$.

Theorem 3. Fix $a > 0$ and an integer $r$ such that $1 \le r \le m_1 \wedge m_2$, $Mr \le n$.
Let the matrices $X_i$ be i.i.d. with distribution $\Pi$ on $\mathcal{X}$ satisfying Assumption 3.
Let the variables $\xi_i$ be independent Gaussian $\mathcal{N}(0, \sigma^2)$, $\sigma^2 > 0$, for $i = 1, \dots, n$.
Then there exist absolute constants $\beta \in (0, 1)$ and $c > 0$, such that

$$\inf_{\hat A} \sup_{A_0 \in \mathcal{A}(r, a)} \mathbb{P}_{A_0}\left( \|\hat A - A_0\|_\infty > c(\sigma \wedge a) \sqrt{\frac{m_1 m_2 M}{n}} \right) \ge \beta. \tag{4.7}$$

The proof of this result can be found in Section 5.3 below.

Comparing Theorem 3 with Corollary 1 we see that, in the case of Gaussian
errors $\xi_i$, the rate of convergence of $\hat A^\lambda$ is optimal (up to a logarithmic factor)
in a minimax sense on the class of matrices $\mathcal{A}(r, a)$.



5. Proofs 



5.1. An intermediate result 

We need the following lemma to prove Theorem 1. 

Lemma 1. The estimator (1.3) satisfies, on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$,

$$\|\mathcal{P}_{A_0}^\perp(\hat A^\lambda - A_0)\|_1 \le 5 \|\mathcal{P}_{A_0}(\hat A^\lambda - A_0)\|_1.$$

Note that this result is an intermediate result in the proof of Theorem 2 in
[13]. For the sake of completeness, we provide here a proof of this result.






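Proof. Note that a necessary condition of extremum in the minimization
problem (1.3) implies that there exists $\hat V \in \partial \|\hat A^\lambda\|_1$ such that, for all $A \in \mathbb{R}^{m_1 \times m_2}$,

$$2\langle \hat A^\lambda, \hat A^\lambda - A \rangle_{L_2(\Pi)} - \left\langle \frac{2}{n} \sum_{i=1}^{n} Y_i X_i, \hat A^\lambda - A \right\rangle + \lambda \langle \hat V, \hat A^\lambda - A \rangle = 0.$$

Set $M = M_1 + M_2$. It follows from the previous display that

$$2\|\hat A^\lambda - A_0\|_{L_2(\Pi)}^2 + \lambda \langle \hat V - V, \hat A^\lambda - A_0 \rangle = -\lambda \langle V, \hat A^\lambda - A_0 \rangle + 2\langle M, \hat A^\lambda - A_0 \rangle,$$

for an arbitrary $V \in \partial \|A_0\|_1$. For the sake of brevity, we set $A_0 = \sum_{j=1}^{r} \sigma_j u_j \otimes v_j$,
where $r = \mathrm{rank}(A_0)$, $u_j = u_j^{(A_0)}$, $v_j = v_j^{(A_0)}$ and $S_j = S_j(A_0)$, $j = 1, 2$. Then, $V$
admits the following representation

$$V = \sum_{j=1}^{r} u_j \otimes v_j + P_{S_1^\perp} W P_{S_2^\perp},$$

where $W$ is an arbitrary matrix with $\|W\|_\infty \le 1$. By monotonicity of the
subdifferential of convex functions, $\langle \hat V - V, \hat A^\lambda - A_0 \rangle \ge 0$. Therefore, we get

$$\lambda \langle P_{S_1^\perp} W P_{S_2^\perp}, \hat A^\lambda - A_0 \rangle \le -\lambda \left\langle \sum_{j=1}^{r} u_j \otimes v_j, \hat A^\lambda - A_0 \right\rangle + 2\langle M, \hat A^\lambda - A_0 \rangle.$$

Set $\Delta = \hat A^\lambda - A_0$. The trace duality guarantees the existence of a matrix $W$
with $\|W\|_\infty = 1$ such that

$$\langle P_{S_1^\perp} W P_{S_2^\perp}, \Delta \rangle = \langle W, P_{S_1^\perp} \Delta P_{S_2^\perp} \rangle = \|P_{S_1^\perp} \Delta P_{S_2^\perp}\|_1.$$

The trace duality again implies that

$$\left| \left\langle \sum_{j=1}^{r} u_j \otimes v_j, \Delta \right\rangle \right| \le \|P_{S_1} \Delta P_{S_2}\|_1.$$

Combining the last three displays, we get, on the event $\lambda \ge 3\|M_1 + M_2\|_\infty$,

$$\begin{aligned}
\|P_{S_1^\perp} \Delta P_{S_2^\perp}\|_1 &\le \|P_{S_1} \Delta P_{S_2}\|_1 + \frac{2}{3} \|\Delta\|_1 \\
&\le \frac{5}{3} \|\mathcal{P}_{A_0}(\Delta)\|_1 + \frac{2}{3} \|P_{S_1^\perp} \Delta P_{S_2^\perp}\|_1,
\end{aligned}$$

where the second line uses $\|P_{S_1} \Delta P_{S_2}\|_1 \le \|\mathcal{P}_{A_0}(\Delta)\|_1$ and $\|\Delta\|_1 \le \|\mathcal{P}_{A_0}(\Delta)\|_1 + \|\mathcal{P}_{A_0}^\perp(\Delta)\|_1$.
Thus we get the result. □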



5.2. Control of the stochastic errors 



The following proposition is an immediate consequence of the matrix version of
Bernstein's inequality (Corollary 9.1 in [22]). For the sake of brevity, we write
$\|\cdot\|$ for the spectral norm $\|\cdot\|_\infty$.
Proposition 2. Let $Z_1, \dots, Z_n$ be independent random matrices with
dimensions $m_1 \times m_2$ that satisfy $\mathbb{E}(Z_i) = 0$ and $\|Z_i\| \le U$ almost surely for some
constant $U$ and all $i = 1, \dots, n$. Define

$$\sigma_Z = \max\left\{ \left\| \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}(Z_i Z_i^\top) \right\|^{1/2},\ \left\| \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}(Z_i^\top Z_i) \right\|^{1/2} \right\}.$$

Then, for all $t > 0$, with probability at least $1 - e^{-t}$ we have

$$\left\| \frac{Z_1 + \dots + Z_n}{n} \right\| \le 2 \max\left\{ \sigma_Z \sqrt{\frac{t + \log(m)}{n}},\ U\, \frac{t + \log(m)}{n} \right\},$$

where $m = m_1 + m_2$.
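
The bound of Proposition 2 can be compared with a simulation. The sketch below is an illustration only: it takes $Z_i = \varepsilon_i X_i$ with $X_i$ uniform on the matrix completion basis and $\varepsilon_i$ independent Rademacher signs, a hypothetical choice for which $\mathbb{E}(Z_i) = 0$, $\|Z_i\| = 1$ and $\sigma_Z^2 = 1/(m_1 \wedge m_2)$. Function names, the seed and the numerical values are assumptions of the example.

```python
import numpy as np

def bernstein_bound(sigma_Z, U, n, m1, m2, t):
    """Right-hand side of the matrix Bernstein bound in Proposition 2."""
    m = m1 + m2
    return 2.0 * max(sigma_Z * np.sqrt((t + np.log(m)) / n),
                     U * (t + np.log(m)) / n)

def simulate_mean_norm(m1, m2, n, rng):
    """Spectral norm of (1/n) * sum_i eps_i X_i for uniform X_i, Rademacher eps_i."""
    rows = rng.integers(0, m1, size=n)
    cols = rng.integers(0, m2, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    S = np.zeros((m1, m2))
    np.add.at(S, (rows, cols), signs)  # accumulates repeated positions
    return float(np.linalg.norm(S / n, 2))

m1, m2, n, t = 10, 10, 2000, 3.0
# For this design E(Z Z^T) = I/m1 and E(Z^T Z) = I/m2.
sigma_Z = 1.0 / np.sqrt(min(m1, m2))
bound = bernstein_bound(sigma_Z, U=1.0, n=n, m1=m1, m2=m2, t=t)
observed = simulate_mean_norm(m1, m2, n, np.random.default_rng(0))
```

The proposition only guarantees `observed <= bound` with probability at least $1 - e^{-t}$, so a single draw is a sanity check rather than a verification.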



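Furthermore, it is possible to replace the $L_\infty$-bound $U$ on $\|Z\|$ in the above
inequality by bounds on the weaker $\psi_\beta$-norms of $\|Z\|$ defined by

$$U_Z^{(\beta)} = \inf\left\{ u > 0 : \mathbb{E} \exp\left( \|Z\|^\beta / u^\beta \right) \le 2 \right\}, \quad \beta \ge 1.$$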



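Proposition 3. Let $Z, Z_1, \dots, Z_n$ be i.i.d. random matrices with dimensions
$m_1 \times m_2$ that satisfy $\mathbb{E}(Z) = 0$. Suppose that $U_Z^{(\beta)} < \infty$ for some $\beta \ge 1$. Then
there exists a constant $C > 0$ such that, for all $t > 0$, with probability at least
$1 - e^{-t}$,

$$\left\| \frac{Z_1 + \dots + Z_n}{n} \right\| \le C \max\left\{ \sigma_Z \sqrt{\frac{t + \log(m)}{n}},\ U_Z^{(\beta)} \left( \log \frac{U_Z^{(\beta)}}{\sigma_Z} \right)^{1/\beta} \frac{t + \log(m)}{n} \right\},$$

where $m = m_1 + m_2$.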

This is an easy consequence of Proposition 2 in [11], which provides an anal- 
ogous result for Hermitian matrices Z. Its extension to rectangular matrices 
stated in Proposition 3 is straightforward via the self-adjoint dilation; see, for 
example, the proof of Corollary 9.1 in [22]. 

Lemma 2. Let the noise variables $\xi_1, \dots, \xi_n$ be i.i.d. and satisfy Assumption
2. Let $X, X_1, \dots, X_n$ be i.i.d. with distribution $\Pi$ on $\mathcal{X}$ satisfying
Assumption 3. Then there exists a constant $C > 0$ that can depend only
on $\beta, \tilde c, \bar c, c_1, c_1'$ and such that, for all $t > 0$, with probability at least $1 - 2e^{-t}$ we
have

$$\|M_1\| \le C\sigma \max\left\{ \sqrt{\frac{t + \log(m)}{(m_1 \wedge m_2) n}},\ \frac{(t + \log(m)) \log^{1/\beta}(m_1 \wedge m_2)}{n} \right\}. \tag{5.1}$$

The proof of this lemma is essentially the same as that of Lemma 2 in [13]
up to some additional technicalities due to the fact that $\Pi$ is no longer assumed to
be the uniform distribution on $\mathcal{X}$. We set $\pi(i, j) = \Pi(e_i(m_1) e_j^\top(m_2))$ for any
$1 \le i \le m_1$, $1 \le j \le m_2$.
Proof. Clearly, we have $\|X\| = 1$. Furthermore, under Assumption 3, we have

$$\max\left\{ \|\mathbb{E}(X)\|, \|\mathbb{E}(X)^\top\| \right\} \le \sqrt{\frac{c_1'}{m_1 m_2}}, \qquad \frac{c_1}{m_1 \wedge m_2} \le \sigma_X^2 \le \frac{c_1'}{m_1 \wedge m_2}. \tag{5.2}$$



Indeed, Assumption 3 implies that

$$\frac{c_1}{m_1 m_2} \le \pi(i, j) \le \frac{c_1'}{m_1 m_2}, \quad \forall i, j. \tag{5.3}$$

Next, we have 



\E(X)\\ = max 

2^eR"2:ta;|2 = l 



Note that the maximum is clearly achieved at a point $x$ satisfying $x_j \ge 0$ for any
$1 \le j \le m_2$ since $\pi(i, j) > 0$ for any $i, j$ in view of the two above displays. Thus,
we get



l|E(X)|| < 



77I1TO2 x£M."^2:\x\2 = 



< 



< 



max 

TOim2 a;6K'"2:|2;|2 = l 



iim2 



miTO2 



where we have used successively the Cauchy-Schwarz inequality, $|x|_2 = 1$ and
$\sum_{i,j} \pi(i, j) = 1$. Similarly, we obtain the same bound for $\|\mathbb{E}(X)^\top\|$.

We have

$$\sigma_X^2 = \max\left\{ \max_{1 \le i \le m_1} \sum_{j=1}^{m_2} \pi(i, j),\ \max_{1 \le j \le m_2} \sum_{i=1}^{m_1} \pi(i, j) \right\}.$$

Combining the above display with (5.3) yields the second part of (5.2).

Next, observe that for $\bar X = X - \mathbb{E}(X)$, we have in view of (5.2) and the
technical condition $2c_1' < M c_1$ that

$$\frac{c_1}{2(m_1 \wedge m_2)} \le \sigma_{\bar X}^2 \le \frac{2c_1'}{m_1 \wedge m_2}. \tag{5.4}$$

Indeed, this follows from the easy fact



xx')\\-mx)\m X) 



< mixx 



< mixx 



combined with (5.2) and Assumption 3. We proceed similarly for 
Now, 



\\E{x)m{x) 

x^xw. 



IIMil 



< 



< 




(5.5) 



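Set $Z_i = \xi_i (X_i - \mathbb{E}X)$. These are i.i.d. random matrices having the same
distribution as a random matrix $Z$. Since $\|X_i\| = 1$ we have that $\|Z_i\| \le 2|\xi_i|$, and
thus Assumption 2 implies that $U_Z^{(\beta)} \le c\sigma$ for some constant $c > 0$.
Furthermore, in view of (5.4), we have $\sigma_Z \le c_2 \sigma / (m_1 \wedge m_2)^{1/2}$ for some constant $c_2 > 0$
depending only on $c_1'$ and $\sigma_Z \ge c_3 \sigma / (m_1 \wedge m_2)^{1/2}$ for some constant $c_3 > 0$
depending only on $c_1, \bar c$. Using these remarks we can deduce from Proposition
3 that there exists a constant $C > 0$ such that for any $t > 0$ with
probability at least $1 - e^{-t}$ we have

$$\begin{aligned}
\left\| \frac{1}{n} \sum_{i=1}^{n} \xi_i (X_i - \mathbb{E}X) \right\| &\le C \max\left\{ \sigma_Z \sqrt{\frac{t + \log(m)}{n}},\ U_Z^{(\beta)} \left( \log \frac{U_Z^{(\beta)}}{\sigma_Z} \right)^{1/\beta} \frac{t + \log(m)}{n} \right\} \\
&\le C\sigma \max\left\{ \sqrt{\frac{t + \log(m)}{(m_1 \wedge m_2) n}},\ \frac{(t + \log(m)) \log^{1/\beta}(m_1 \wedge m_2)}{n} \right\}.
\end{aligned}$$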



Finally, in view of Assumption 2 and Bernstein's inequality for sub-exponential
noise, we have for any $t > 0$, with probability at least $1 - e^{-t}$,



1 " 



< Ca max 




where $C > 0$ depends only on $\tilde c$. We complete the proof by using the union
bound. □

We now treat $\|M_2\|$.

Lemma 3. Let $X, X_1, \dots, X_n$ be i.i.d. random matrices with distribution $\Pi$ on
$\mathcal{X}$ satisfying Assumption 3. Assume, in addition, that $\max_{i,j} |a_0(i, j)| \le a$ for
some $a > 0$. Then, for all $t > 0$, with probability at least $1 - e^{-t}$ we have

$$\|M_2\| \le 2c_1' a \max\left\{ \sqrt{\frac{t + \log(m)}{(m_1 \wedge m_2) n}},\ \frac{2(t + \log(m))}{n} \right\}. \tag{5.6}$$
Proof. We apply Proposition 2 for the random variables $Z_i = \mathrm{tr}(A_0^\top X_i) X_i -
\mathbb{E}(\mathrm{tr}(A_0^\top X) X)$. Using (5.2) we get $\|Z_i\| \le 2 \max_{i,j} |a_0(i, j)|$ and

$$\sigma_Z^2 \le \max\left\{ \|\mathbb{E}(\langle A_0, X \rangle^2 X X^\top)\|,\ \|\mathbb{E}(\langle A_0, X \rangle^2 X^\top X)\| \right\} \le \frac{c_1' a^2}{m_1 \wedge m_2}.$$

Thus, (5.6) follows from Proposition 2. □
5.3. Proof of Theorem 3 

Proof. We assume w.l.o.g. that $M = m_1 \vee m_2 = m_1 \ge m_2$. The idea is to adapt
to our context Theorem 5 in [13]. Note that Theorem 5 is established under a
restricted isometry condition in expectation (see Assumption 2 in [13]). A quick
investigation of the proof shows that the conclusion of this theorem is still valid
for $X_1, \dots, X_n$ i.i.d. with distribution $\Pi$ satisfying Assumption 3. Indeed, we
then have for any $A \in \mathbb{R}^{m_1 \times m_2}$

$$\frac{c_1}{m_1 m_2} \|A\|_2^2 \le \|A\|_{L_2(\Pi)}^2 \le \frac{c_1'}{m_1 m_2} \|A\|_2^2. \tag{5.7}$$

Recall that [13] established in the proof of Theorem 5 the existence of a subset
A° C A{r, a) with cardinality Card(yl°) > 2""^^^^ + 1 containing the zero 7771 X 
7772 matrix and such that, for any two distinct elements Ai and A2 of A'^, 

j^iaA af^^ < II - ^2(1^ < ^^a A a)^^-^. (5.8) 

Next, using (5.7) instead of Assumption 2 in [13], Equations (4.3) and (4.4)
in [13] are replaced respectively by

||Ai-A2|li,(n)>ci^(aAa)2^, (5.9) 

and 

A-(Po,Pa) = ^PIlL(n) <c'i-^mir, (5.10) 

where $K(\mathbb{P}_0, \mathbb{P}_A)$ is the Kullback-Leibler distance between $\mathbb{P}_0$ and $\mathbb{P}_A$ and $\gamma > 0$
is some numerical quantity introduced in the construction of the set $\mathcal{A}^0$ in [13].
For any two distinct matrices $A_1, A_2$ of $\mathcal{A}^0$, we have



Pi - ^2||oo > A a)^ (5.11) 

Indeed, if (5.11) does not hold, we get 

Pi - A2||L(n) < ^^rank(Ai - ^2)^1 - A^Wl < ci^(a A a f-^, 
since $\mathrm{rank}(A_1 - A_2) \le r$ by construction of $\mathcal{A}^0$ in [13]. This contradicts (5.9).

We now take $\gamma > 0$ sufficiently small depending only on $c_1', c_1, \alpha$ with $\alpha > 0$
so that